LPAS: High Efficiency Load Balancing Parallel Data Mining Algorithm

Authors

  • Sheng-Hui Liu
  • Li-Wei Zhou
  • Kun-Ming Yu
Abstract

Association rule discovery plays an important role in knowledge discovery and data mining, and efficiency is especially crucial for algorithms that find frequent itemsets in large databases. Many methods have been proposed to solve this problem. In addition, parallel computing has become a popular trend on cloud platforms, grid systems, and multicore platforms. This paper proposes a high-efficiency, load-balancing parallel data mining method based on Apriori with sorting, called the Load balancing Parallel mining method based on Apriori with Sorting (LPAS). The main goal of the proposed algorithm is to reduce the large number of duplicate candidates generated by previous methods. The experimental results show that the algorithm outperforms previous methods and that its computation time drops dramatically as the number of threads increases. Moreover, the workload was observed to be dispatched equally to each computing unit.

Keywords: parallel data mining; Apriori; load balancing; association rules
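
The two ideas named in the abstract, keeping itemsets sorted so that each candidate is generated only once and splitting the counting work evenly across threads, can be illustrated with a small sketch. This is not the authors' LPAS implementation, whose details are not given on this page. It is a generic Apriori under stated assumptions: the function names (`generate_candidates`, `count_chunk`, `apriori_parallel`), the round-robin split of candidates, and the use of a thread pool are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def generate_candidates(frequent, k):
    """Join sorted (k-1)-itemsets that share their first k-2 items.

    Because itemsets are sorted tuples, each k-candidate is built from
    exactly one pair, so no duplicate candidates are produced.
    """
    frequent = sorted(frequent)
    frequent_set = set(frequent)
    candidates = []
    for i, a in enumerate(frequent):
        for b in frequent[i + 1:]:
            if a[:-1] != b[:-1]:
                break  # sorted order: no later itemset shares this prefix
            cand = a + (b[-1],)
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if all(sub in frequent_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


def count_chunk(chunk, transactions):
    """Count the support of each candidate in one worker's chunk."""
    return {c: sum(1 for t in transactions if t.issuperset(c)) for c in chunk}


def apriori_parallel(transactions, min_support, workers=4):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [(item,) for item in items]
    result = {}
    k = 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while candidates:
            # Assumed balancing scheme: round-robin split, so chunk sizes
            # differ by at most one candidate.
            chunks = [candidates[w::workers] for w in range(workers)]
            counts = {}
            for partial in pool.map(count_chunk, chunks,
                                    [transactions] * workers):
                counts.update(partial)
            frequent = [c for c in candidates if counts[c] >= min_support]
            result.update({c: counts[c] for c in frequent})
            k += 1
            candidates = generate_candidates(frequent, k)
    return result
```

In CPython the global interpreter lock limits the speed-up of this CPU-bound counting loop, so the sketch only shows how the work is divided; it makes no claim about the timings reported in the paper.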

Similar resources

Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment

Big data mining methods support knowledge discovery on highly scalable, high-volume, and high-velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages the homogeneous and heterogeneous computing models. The MapReduce fra...

A New Load Balancing Approach for Parallel FP-Growth

Due to the exponential growth in worldwide information, companies have to deal with an ever-growing amount of digital information. The huge size of data and the computation volume of new processing applications such as data mining therefore lead to new high-performance parallel processing systems. One of the most important challenges of such applications is quickly and correctly finding the relationship ...

Application of Parallelized Apriori in Grid Computing Environment

The goal of the strategy is to improve the performance of distributed algorithms and better their responsiveness. Association rule mining algorithms have high computational complexity due to the size of their search space and the high demands of data access. The work aims at mining the data in a grid computing environment, which computes by distributing the data to its clusters and mines it in...

A Parallel MapReduce Algorithm to Efficiently Support Itemset Mining on High Dimensional Data

In today’s world, large volumes of data are being continuously generated by many scientific applications, such as bioinformatics or networking. Since each monitored event is usually characterized by a variety of features, high-dimensional datasets have been continuously generated. To extract value from these complex collections of data, different exploratory data mining algorithms can be used to...

Parallel Performance of Adaptive Algorithms with Dynamic Load Balancing

Parallelization of adaptive algorithms leads to problems with parallel efficiency. Adaptation is a method which introduces dynamic perturbations to the computational environment. This in turn causes problems with proper load balance. To ensure proper efficiency of a parallel simulation it is necessary to perform load balancing whenever a certain threshold of load balance is breached. In this paper au...

Publication date: 2013